LRTDP Versus UCT for Online Probabilistic Planning

Authors

Andrey Kolobov

Mausam

Daniel S. Weld

Abstract

UCT, the premier method for solving games such as Go, is also becoming the dominant algorithm for probabilistic planning. Out of the five solvers at the International Probabilistic Planning Competition (IPPC) 2011, four were based on the UCT algorithm. However, while a UCT-based planner, PROST, won the contest, an LRTDP-based system, GLUTTON, came in a close second, outperforming other systems derived from UCT. These results raise a question: what are the strengths and weaknesses of LRTDP and UCT in practice? This paper begins to answer this question by contrasting the two approaches in the context of finite-horizon MDPs. We demonstrate that in such scenarios, UCT’s lack of a sound termination condition is a serious practical disadvantage. To handle an MDP with a large finite horizon under a time constraint, UCT forces an expert to guess a non-myopic lookahead value for which it should be able to converge on the encountered states. Mistakes in setting this parameter can greatly hurt UCT’s performance. In contrast, LRTDP’s convergence criterion allows for an iterative deepening strategy. Using this strategy, LRTDP automatically finds the largest lookahead value feasible under the given time constraint. As a result, LRTDP has better performance and stronger theoretical properties. We present an online version of GLUTTON, named GOURMAND, that illustrates this analysis and outperforms PROST on the set of IPPC-2011 problems.
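The abstract's central point is that a solver with a termination test can tell the planner when a lookahead of h steps has been fully solved, so the planner can move on to h + 1 and stop when time runs out, whereas UCT gives no such signal and its lookahead must be fixed in advance. The Python sketch below illustrates only that control loop, under several stated assumptions: exact memoized backward induction over (state, steps-to-go) pairs stands in for LRTDP (LRTDP reaches converged values through labeled trials over greedily reachable states rather than by exhaustive expansion); a count of Bellman backups stands in for the wall-clock limit so the behavior is deterministic; and the toy MDP and every name in it (`DepthLimitedSolver`, `iterative_deepening`, `toy_transitions`, and so on) are hypothetical. It is not the GLUTTON or GOURMAND implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional, Tuple

State = Hashable
Action = Hashable
# Hypothetical model interface: transitions(s, a) -> [(prob, next_state, reward), ...]
Transitions = Callable[[State, Action], List[Tuple[float, State, float]]]


class BudgetExhausted(Exception):
    """Raised when the backup budget (a stand-in for the time limit) runs out."""


@dataclass
class DepthLimitedSolver:
    # Assumption: exact backward induction stands in for LRTDP's labeled trials.
    actions: Callable[[State], List[Action]]
    transitions: Transitions
    budget: int  # maximum number of Bellman backups we may perform
    backups: int = 0
    # Exact values V[(state, steps_to_go)]; only fully computed entries are stored,
    # so they stay valid and are reused when the horizon is increased.
    values: Dict[Tuple[State, int], float] = field(default_factory=dict)

    def q_value(self, s: State, a: Action, h: int) -> float:
        return sum(p * (r + self.value(s2, h - 1)) for p, s2, r in self.transitions(s, a))

    def value(self, s: State, h: int) -> float:
        if h == 0:
            return 0.0  # no steps left: nothing more to collect
        if (s, h) in self.values:
            return self.values[(s, h)]
        if self.backups >= self.budget:
            raise BudgetExhausted()
        self.backups += 1
        v = max(self.q_value(s, a, h) for a in self.actions(s))
        self.values[(s, h)] = v  # cached only after the backup completes
        return v

    def greedy_action(self, s: State, h: int) -> Action:
        # Children of (s, h) are already cached, so this performs no new backups.
        return max(self.actions(s), key=lambda a: self.q_value(s, a, h))


def iterative_deepening(solver: DepthLimitedSolver, s0: State,
                        max_horizon: int) -> Optional[Tuple[Action, int]]:
    """Return (action, lookahead) for the deepest horizon solved within budget."""
    best = None
    for h in range(1, max_horizon + 1):
        try:
            solver.value(s0, h)  # horizon h is now solved exactly
        except BudgetExhausted:
            break  # fall back to the deepest horizon that finished
        best = (solver.greedy_action(s0, h), h)
    return best


# Toy MDP (illustrative only): "collect" pays level**2 and stays put;
# "invest" pays nothing but raises the level (capped at 3) with probability 0.5.
def toy_actions(s):
    return ["collect", "invest"]


def toy_transitions(s, a):
    if a == "collect":
        return [(1.0, s, float(s * s))]
    return [(0.5, min(s + 1, 3), 0.0), (0.5, s, 0.0)]


if __name__ == "__main__":
    for budget in (1, 3, 1000):
        solver = DepthLimitedSolver(toy_actions, toy_transitions, budget)
        print(budget, iterative_deepening(solver, 1, max_horizon=5))
```

On this toy problem, budgets of 1, 3, and 1000 backups yield lookaheads of 1, 2, and 5 (the cap set by `max_horizon`), and the chosen action switches from the myopic `collect` to `invest` once the lookahead reaches 2. Because only fully computed values are cached, work done for shorter horizons is reused as the horizon grows, and an interrupted horizon never corrupts the answer from the previous one.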

Similar Resources

PROST: Probabilistic Planning Based on UCT

We present PROST, a probabilistic planning system that is based on the UCT algorithm by Kocsis and Szepesvári (2006), which has been applied successfully to many areas of planning and acting under uncertainty. The objective of this paper is to show the application of UCT to domain-independent probabilistic planning, an area it had not been applied to before. We furthermore present several enhanc...

Solving Uncertain MDPs by Reusing State Information and Plans

While MDPs are powerful tools for modeling sequential decision making problems under uncertainty, they are sensitive to the accuracy of their parameters. MDPs with uncertainty in their parameters are called Uncertain MDPs. In this paper, we introduce a general framework that allows off-the-shelf MDP algorithms to solve Uncertain MDPs by planning based on currently available information and repla...

Probabilistic Temporal Planning

Planning research has explored the issues that arise when planning with concurrent and durative actions. Separately, planners that can cope with probabilistic effects have also been created. However, few attempts have been made to combine both probabilistic effects and concurrent durative actions into a single planner. The principal one of which we are aware was targeted at a specific domain. W...

Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors

In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primari...

ASAP-UCT: Abstraction of State-Action Pairs in UCT

Monte-Carlo Tree Search (MCTS) algorithms such as UCT are an attractive online framework for solving planning under uncertainty problems modeled as a Markov Decision Process. However, MCTS search trees are constructed in flat state and action spaces, which can lead to poor policies for large problems. In a separate research thread, domain abstraction techniques compute symmetries to reduce the ...

Publication date: 2012